Prof. Matthieu Bloch
Thursday, August 31, 2023
Let \(f:\bbR\to\bbR\) be a convex function such that \(f(1)=0\). The \(f\)-divergence between two distributions \(P,Q \in\Delta(\calX)\) is
\[\begin{align*} \bbD_f(P\Vert Q)\eqdef \sum_{x\in\calX}f\left(\frac{P(x)}{Q(x)}\right)Q(x), \end{align*}\]
with the convention that \(\bbD_f(P\Vert Q)=\infty\) if \(P\) is not absolutely continuous with respect to \(Q\).
For any \(P,Q \in \Delta(\calX)\), \(\bbD_f(P\Vert Q)\geq 0\). If \(f\) is strictly convex, equality holds if and only if \(P=Q\).
A function \(W:\calX\times\calY\to\bbR:(x,y)\mapsto W(y|x)\) is a Markov kernel (also called a transition probability) from \(\calX\) to \(\calY\) if \(W(y|x)\geq 0\) for all \((x,y)\in\calX\times\calY\) and \(\sum_{y\in\calY}W(y|x)=1\) for all \(x\in\calX\).
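As a minimal numerical sketch of the definition (assuming finite alphabets stored as NumPy arrays; the helper name f_divergence and the example PMFs are illustrative, not part of the notes):

```python
import numpy as np

def f_divergence(P, Q, f):
    """D_f(P||Q) = sum_x f(P(x)/Q(x)) Q(x) over a finite alphabet.

    Returns infinity when P is not absolutely continuous w.r.t. Q.
    """
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    if np.any((Q == 0) & (P > 0)):        # P not absolutely continuous w.r.t. Q
        return np.inf
    mask = Q > 0                          # terms with Q(x) = 0 contribute 0
    return float(np.sum(f(P[mask] / Q[mask]) * Q[mask]))

# Example with f(t) = t ln t on a 3-symbol alphabet
P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]
print(f_divergence(P, Q, lambda t: t * np.log(t)) >= 0)   # True: non-negativity
```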
Let \(P,Q\in\Delta(\calX)\) and let \(W\) be a Markov kernel from \(\calX\) to \(\calY\).
\[\begin{align*} \bbD_f(W\circ P\Vert W\circ Q) \leq \bbD_f(W\cdot P\Vert W\cdot Q) = \bbD_f(P\Vert Q). \end{align*}\]
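A small numerical illustration of this inequality, using relative entropy as the \(f\)-divergence; the kernel \(W\) and the distributions below are arbitrary examples, not from the notes:

```python
import numpy as np

def kl(P, Q):
    # Assumes Q(x) > 0 wherever P(x) > 0
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

P = np.array([0.7, 0.2, 0.1])
Q = np.array([0.3, 0.3, 0.4])
W = np.array([[0.9, 0.1],          # W[x, y] = W(y|x); each row sums to 1
              [0.5, 0.5],
              [0.2, 0.8]])

WP, WQ = P @ W, Q @ W              # output distributions W∘P and W∘Q
print(kl(WP, WQ) <= kl(P, Q))      # True: processing cannot increase divergence
```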
The \(f\)-divergence \(\bbD_f:\Delta(\calX)\times\Delta(\calX)\to\bbR^+:(P,Q)\mapsto\mathbb{D}_f(P\Vert Q)\) is jointly convex, i.e., for all \(P_1,Q_1,P_2,Q_2\in\Delta(\calX)\) and \(\lambda\in[0,1]\)
\[\begin{align*} \mathbb{D}_f\left(\lambda P_1+(1-\lambda) P_2\Vert \lambda Q_1+(1-\lambda) Q_2\right)\leq \lambda\mathbb{D}_f(P_1\Vert Q_1)+(1-\lambda)\mathbb{D}_f(P_2\Vert Q_2). \end{align*}\]
For \(P,Q\in\Delta(\calX)\), the total variation between \(P\) and \(Q\) is the \(f\)-divergence for \(f:t\mapsto\frac{1}{2}\abs{t-1}\), i.e.,
\[\begin{align} \mathbb{V}\left(P,Q\right)\eqdef \frac{1}{2}\norm[1]{P-Q}\eqdef \frac{1}{2}\sum_{x\in\calX}\abs{P(x)-Q(x)}. \end{align}\]
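A quick numerical check that the \(f\)-divergence form agrees with the \(\ell_1\) expression (the example distributions are arbitrary):

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.25, 0.25, 0.5])

tv_l1 = 0.5 * np.sum(np.abs(P - Q))               # (1/2) ||P - Q||_1
tv_f  = np.sum(0.5 * np.abs(P / Q - 1.0) * Q)     # sum_x f(P/Q) Q with f(t) = |t-1|/2
print(tv_l1, tv_f)                                # both equal 0.3
```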
For \(P,Q\in\Delta(\calX)\), the relative entropy, also called Kullback-Leibler divergence, between \(P\) and \(Q\) is the \(f\)-divergence for \(f:t\mapsto t\ln t\), i.e.,
\[\begin{align} \bbD(P\Vert Q)\eqdef \sum_{x\in\calX}P(x)\ln\frac{P(x)}{Q(x)}, \end{align}\]
with the convention that \(\bbD(P\Vert Q) = \infty\) if \(P\) is not absolutely continuous with respect to \(Q\).
The only function that satisfies Shannon's axioms is \(\bbH(X) = -\sum_xP_X(x)\log_2 P_X(x)\).
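A direct evaluation of the relative entropy, including the absolute-continuity convention; the distributions are illustrative examples:

```python
import numpy as np

def relative_entropy(P, Q):
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    if np.any((Q == 0) & (P > 0)):     # P not absolutely continuous w.r.t. Q
        return np.inf
    mask = P > 0                       # terms with P(x) = 0 contribute 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

print(relative_entropy([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))  # ln 2, finite
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))              # inf: P ≪ Q fails
```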
Let \(X\in\calX\) be a discrete random variable with \(\card{\calX}<\infty\). The Shannon entropy of \(X\) is defined as
\[\begin{align} \mathbb{H}(X) \eqdef \E[X]{-\log_2 P_X(X)} = - \sum_{x \in {\calX}} P_X(x) \log_2 P_X(x), \end{align}\]
with the convention that \(0\log 0 = 0\).
The entropy \(\bbH:\Delta(\calX)\to\bbR:P\mapsto \bbH(P)\) is a concave function.
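A small check of the definition and of the bounds stated below (the PMFs are arbitrary examples):

```python
import numpy as np

def entropy(P):
    P = np.asarray(P, dtype=float)
    P = P[P > 0]                       # convention 0 log 0 = 0
    return float(-np.sum(P * np.log2(P)))

print(entropy([1.0, 0.0, 0.0]))        # 0.0: constant random variable
print(entropy([0.25] * 4))             # 2.0 = log2(4): uniform maximizes entropy
print(entropy([0.5, 0.25, 0.25]))      # 1.5 bits, strictly between the bounds
```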
Let \(X\in\calX\) be a discrete random variable. Then \(\mathbb{H}(X) \geq 0\) with equality iff \(X\) is a constant, i.e., there exists \(x^*\in\calX\) such that \(\P{X=x^*}=1\).
Let \(X \in {\calX}\) be a discrete random variable. Then \(\mathbb{H}(X)\leq \log_2 \card{\calX}\) with equality if and only if \(X\) is uniform over \({\calX}\), i.e., \(\forall x \in {\calX}, P_X(x) = \frac{1}{\card{\calX}}\).
The joint entropy of two discrete random variables \(X\in\calX\) and \(Y\in\calY\) with joint PMF \(P_{XY}\) is
\[\begin{align*} \mathbb{H}(X,Y) \eqdef \mathbb{E}_{XY}(-\log_2 P_{XY}(X,Y)) = - \sum_{x \in {\calX}} \sum_{y \in \mathcal{Y}}P_{XY}(x, y) \log_2 P_{XY}(x, y). \end{align*}\]
Furthermore, the conditional entropy of \(Y\) given \(X\) is
\[\begin{align*} \mathbb{H}(Y|X) \eqdef \mathbb{E}_{XY}(-\log_2 P_{Y|X}(Y|X)) = - \sum_{x \in {\calX}} \sum_{y \in \mathcal{Y}} P_{XY}(x,y) \log_2 P_{Y|X}(y|x). \end{align*}\]
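The following sketch computes the joint and conditional entropies from a joint PMF and verifies the chain rule \(\mathbb{H}(X,Y)=\mathbb{H}(X)+\mathbb{H}(Y|X)\); the joint PMF below is an arbitrary example:

```python
import numpy as np

P_XY = np.array([[0.30, 0.10],                 # P_XY[x, y]
                 [0.20, 0.40]])
P_X = P_XY.sum(axis=1)                         # marginal of X
P_Y_given_X = P_XY / P_X[:, None]              # kernel P_{Y|X}(y|x)

H_XY = -np.sum(P_XY * np.log2(P_XY))                   # joint entropy H(X,Y)
H_X  = -np.sum(P_X * np.log2(P_X))                     # marginal entropy H(X)
H_Y_given_X = -np.sum(P_XY * np.log2(P_Y_given_X))     # conditional entropy H(Y|X)
print(np.isclose(H_XY, H_X + H_Y_given_X))             # True: chain rule
```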
Let \(X, Y\) be discrete random variables with joint PMF \(P_{XY}\). Then \(\mathbb{H}(Y|X) \geq 0\) with equality if and only if \(Y\) is a function of \(X\).
Let \(X\) and \(Y\) be discrete random variables with joint PMF \(P_{XY}\). Then \(\mathbb{H}(X|Y) \leq \mathbb{H}(X)\) with equality if and only if \(X\) is independent of \(Y\).
Let \(X\), \(Y\), and \(Z\) be discrete random variables with joint PMF \(P_{XYZ}\). Then \[ \mathbb{H}(XY|Z) = \mathbb{H}(X|Z) + \mathbb{H}(Y|XZ) = \mathbb{H}(Y|Z) + \mathbb{H}(X|YZ). \] More generally, if \(\bfX\eqdef\set{X_i}_{i=1}^n\) and \(Z\) are jointly distributed random variables we have \[ \mathbb{H}(\bfX|Z) = \sum_{i=1}^n\mathbb{H}(X_i|X^{1:i-1}Z) \] with the convention \(X^{1:0}\eqdef \emptyset\).
Let \(X, Y\) be two random variables with joint PMF \(P_{XY}\). The mutual information between \(X\) and \(Y\) is \[ \bbI(X;Y)\eqdef \bbH(X)-\bbH(X|Y)=\bbD(P_{XY}\Vert P_XP_Y). \]
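As a sanity check, the two expressions for the mutual information coincide numerically; the joint PMF is an arbitrary example:

```python
import numpy as np

P_XY = np.array([[0.30, 0.10],
                 [0.20, 0.40]])
P_X, P_Y = P_XY.sum(axis=1), P_XY.sum(axis=0)

H = lambda p: float(-np.sum(p[p > 0] * np.log2(p[p > 0])))

I_via_entropy = H(P_X) + H(P_Y) - H(P_XY.ravel())         # equals H(X) - H(X|Y)
I_via_kl = float(np.sum(P_XY * np.log2(P_XY / np.outer(P_X, P_Y))))  # D(P_XY || P_X P_Y)
print(np.isclose(I_via_entropy, I_via_kl))                # True
```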
If \(\Delta(\calX\to\calY)\) denotes the set of Markov kernels from \(\calX\) to \(\calY\), then the function \[ \bbI:\Delta(\calX)\times \Delta(\calX\to\calY)\to\bbR^+:(P,W)\mapsto \bbI(P;W)\eqdef \mathbb{D}(W\cdot P\Vert (W\circ P)P) \] is a concave function of \(P\) and a convex function of \(W\).
Let \(X, Y, Z\) be discrete random variables with joint distribution \(p_{XYZ}\). The conditional mutual information between \(X\) and \(Y\) given \(Z\) is \(\mathbb{I}(X ; Y|Z) \eqdef \mathbb{H}(X|Z) - \mathbb{H}(X|YZ)\).
\(\mathbb{I}(X;Y|Z)\geq 0\), with equality if and only if \(X\) and \(Y\) are conditionally independent given \(Z\).
Let \(\bfX\eqdef\set{X_i}_{i=1}^n\), \(Y\), and \(Z\) be jointly distributed random variables. Then,
\[\begin{align*} \mathbb{I}(\bfX;Y|Z) = \sum_{i=1}^n\mathbb{I}(X_i;Y|Z\bfX^{1:i-1})\textsf{ with the convention }\bfX^{1:0}=\emptyset. \end{align*}\]
Let \(X\), \(Y\), and \(Z\) be discrete random variables such that \(X \rightarrow Y \rightarrow Z\) forms a Markov chain. Then \(\mathbb{I}(X;Y) \geq \mathbb{I}(X;Z)\) or, equivalently, \(\mathbb{H}(X|Z) \geq \mathbb{H}(X|Y)\).
Let \(X\) be a discrete random variable with alphabet \({\calX}\), and let \(\widehat{X}\in{\calX}\) be an estimate of \(X\), with joint PMF \(P_{X\widehat{X}}\). Define the probability of estimation error \(P_e \eqdef \mathbb{P}[X\neq\widehat{X}]\). Then, \(\mathbb{H}(X|\widehat{X}) \leq \mathbb{H}_b(P_e) + P_e \log_2(\card{\calX}-1)\).
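A numerical check of Fano's inequality for an arbitrary example joint PMF of \((X,\widehat{X})\) on a 3-symbol alphabet:

```python
import numpy as np

P = np.array([[0.30, 0.03, 0.02],       # P[x, xhat], correct guesses on the diagonal
              [0.02, 0.25, 0.03],
              [0.05, 0.05, 0.25]])

P_xhat = P.sum(axis=0)
P_x_given_xhat = P / P_xhat[None, :]
H_X_given_Xhat = -np.sum(P * np.log2(P_x_given_xhat))    # H(X | X̂)

Pe = 1.0 - np.trace(P)                                   # P[X != X̂]
Hb = -Pe * np.log2(Pe) - (1 - Pe) * np.log2(1 - Pe)      # binary entropy H_b(Pe)
bound = Hb + Pe * np.log2(P.shape[0] - 1)
print(H_X_given_Xhat <= bound + 1e-12)                   # True: Fano's inequality
```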
Let \(X\in\calX\) be a discrete random variable. The Rényi entropy of order \(\alpha\in\mathbb{R}\cup\{\infty\}\) of \(X\) is
\[\begin{align*} H_{\alpha}(X)\eqdef \frac{1}{1-\alpha}\log \sum_{x\in\calX} P_X(x)^\alpha, \end{align*}\]
with the convention that \(H_1(X)=\lim_{\alpha\rightarrow 1} H_\alpha(X)\) and \(H_\infty(X)=\lim_{\alpha\rightarrow \infty} H_\alpha(X)\).
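A short sketch evaluating the Rényi entropy for several orders, including the \(\alpha\to 1\) (Shannon) and \(\alpha\to\infty\) limits; the PMF is an arbitrary example and logarithms are taken in base 2 for consistency with the Shannon entropy above:

```python
import numpy as np

def renyi_entropy(P, alpha):
    P = np.asarray(P, dtype=float)
    P = P[P > 0]
    if np.isinf(alpha):
        return float(-np.log2(P.max()))             # H_inf = -log2 max_x P_X(x)
    if np.isclose(alpha, 1.0):
        return float(-np.sum(P * np.log2(P)))       # limit as alpha -> 1 is Shannon entropy
    return float(np.log2(np.sum(P ** alpha)) / (1 - alpha))

P = [0.5, 0.25, 0.125, 0.125]
for a in [0.5, 1.0, 2.0, np.inf]:
    print(a, renyi_entropy(P, a))                   # values are non-increasing in alpha
```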
Let \(X\in\calX\) be a discrete random variable. Then,